In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import warnings
import difflib
warnings.filterwarnings('ignore')
In [2]:
students_ireland = pd.read_excel('Data.xlsx', sheet_name='Students_Ireland')
parents_ireland = pd.read_excel('Data.xlsx', sheet_name='Parents_Ireland')
students_india = pd.read_excel('Data.xlsx', sheet_name='Students_India')
parents_india = pd.read_excel('Data.xlsx', sheet_name='Parents_India')
In [3]:
# Add country and group labels
students_ireland['Country'] = 'Ireland'
students_ireland['Group'] = 'Student'
parents_ireland['Country'] = 'Ireland'
parents_ireland['Group'] = 'Parent'
students_india['Country'] = 'India'
students_india['Group'] = 'Student'
parents_india['Country'] = 'India'
parents_india['Group'] = 'Parent'
In [4]:
students_india.head()
Out[4]:
Id Start time Completion time Email Name What age are you?\n Are you a boy or a girl?\n Would you like to go to college when you finish secondary school? \n\n What would you like to be when you are older? How confident are you in your coding skills? ... Which STEM area would you prefer for your further education? Compared to most of your other school subjects, how good do you think you’d be at coding? 1 on the scale means not good at all compared to your other school subjects and 7 on the scale means much bett I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very I think that children who are neither boys or girls do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by putting a tick in the box, where 0 is not true What do you think you can do if you are a coder? \n Have you ever taken any computer science courses in the past? (if yes, write in "other" the age you were when you did the computer science course) Have you ever taken any coding courses in the past? (if yes, write in "other" the age you were when you did the coding course AND write the language(s), technologies, software learned) Country Group
0 1 2025-07-17 14:41:31 2025-07-17 14:48:00 anonymous NaN 14 Girl Yes Not Decided A little confident ... Science (e.g., Biology, Chemistry, Physics) 3 100 100.0 0 I can make my own games and websites. No Scratch India Student
1 2 2025-07-17 14:58:00 2025-07-17 15:01:01 anonymous NaN 18 Boy Yes Not sure Quite a bit confident ... Engineering (e.g., Mechanical, Electrical, Civil) 6 50 50.0 100 Lots of things 11 11 - Python, Java, etc India Student
2 3 2025-07-17 14:59:16 2025-07-17 15:02:15 anonymous NaN 14 Boy Yes Not sure Not at all confident ... Mathematics (e.g., Statistics, Applied Math, D... 3 25 65.0 50 Not sure No No India Student
3 4 2025-07-17 14:59:37 2025-07-17 15:07:44 anonymous NaN 11 Boy Yes IAS officer A little confident ... Science (e.g., Biology, Chemistry, Physics) 4 90 100.0 0 Build games and educational sites No No India Student
4 5 2025-07-17 15:07:55 2025-07-17 15:17:35 anonymous NaN 12 Girl Yes Actress Not at all confident ... Arts Not good at all 1 80 95.0 75 hacking No No India Student

5 rows × 50 columns

In [5]:
students_ireland.head()
Out[5]:
Sheet Are you a boy or a girl Would you like to go to college when you finish secondary school? What would you like to be when you are older? How confident are you in your computer skills? How confident are you in your coding skills? How interested are you in science and technology? How interested are you in maths? timestamp Source of dataset Camp Country Group
0 1 1 1 Something to do with technology/writing 3 3 3 4 01.07.2024 at 11:22:21 O 1 Ireland Student
1 2 1 1 Electrical engineer 3 4 1 2 01.07.2024 at 11:23:10 O 1 Ireland Student
2 3 2 1 Doctor/ medicinal scientist 2 3 1 3 01.07.2024 at 11:24:33 O 1 Ireland Student
3 4 2 1 A primary school teacher 3 3 2 2 01.07.2024 at 11:25:32 O 1 Ireland Student
4 5 1 1 Pro footballer 4 4 3 4 01.07.2024 at 11:26:11 O 1 Ireland Student
In [6]:
parents_india.head()
Out[6]:
Id Start time Completion time Email Name Do you have any of the following types of technology in the home? (Please choose as many as apply) Do you think that technology helps your child’s learning? If yes, how? If not, why not? \n Do you have a qualification in a science, technology, engineering or mathematics (STEM) field? If yes, what is your qualification? Which sector do you work in? ... Please select the option on the scale below that reflects your opinion on the following statements: \n.To do well in coding, my child has to try I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very Finally, listed there are 5 STEM subjects listed below. For each one please indicate how suitable you think men or women are for each subject by checking the appropriate box. There are no right or wro Finally, listed there are 5 STEM subjects listed below. For each one please indicate how suitable you think men or women are for each subject by checking the appropriate box. There are no right or wro1 Finally, listed there are 5 STEM subjects listed below. For each one please indicate how suitable you think men or women are for each subject by checking the appropriate box. There are no right or wro2 Finally, listed there are 5 STEM subjects listed below. For each one please indicate how suitable you think men or women are for each subject by checking the appropriate box. There are no right or wro3 Finally, listed there are 5 STEM subjects listed below. For each one please indicate how suitable you think men or women are for each subject by checking the appropriate box. There are no right or wro4 Country Group
0 1 2025-07-17 14:37:23 2025-07-17 14:40:16 anonymous NaN Smartphone;iPad/Tablet;Laptop Computer;3/4G In... Yes- integral part of learning these days Yes BE Business ... A little 7 100 100 Neutral 4 Neutral 4 Neutral 4 Neutral 4 Neutral 4 India Parent
1 2 2025-07-17 14:40:39 2025-07-17 14:42:43 anonymous NaN Smartphone;iPad/Tablet;Laptop Computer;WiFi In... Yes thru access to the web. Yes BE, MBA Business ... 3 50 50 Neutral 4 Neutral 4 Neutral 4 3 5 India Parent
2 3 2025-07-17 14:54:34 2025-07-17 14:57:01 anonymous NaN Smartphone;iPad/Tablet;Laptop Computer;3/4G In... Yes. Provides ready access to a wide variety o... No NaN Business ... 7 50 70 Neutral 4 Neutral 4 Neutral 4 Neutral 4 Neutral 4 India Parent
3 4 2025-07-17 14:53:11 2025-07-17 15:04:23 anonymous NaN Smartphone;iPad/Tablet;Laptop Computer;3/4G In... Yes, Convenient and vast researches can be con... Yes Science in school till grade 12th Self employed ... 4 100 100 Neutral 4 Neutral 4 Neutral 4 Neutral 4 Neutral 4 India Parent
4 5 2025-07-17 14:59:48 2025-07-17 15:07:19 anonymous NaN Smartphone;iPad/Tablet;Laptop Computer;Desktop... Technology is a two sided coin! Has it’s pros ... Yes Had done science with maths and later took scu... Business ... 4 70 95 Neutral 4 Neutral 4 Neutral 4 Neutral 4 Neutral 4 India Parent

5 rows × 25 columns

In [7]:
parents_ireland.head()
Out[7]:
Sheet Smartphone iPad/Tablet Laptop computer Desktop computer 3/4g internet Wifi internet Do you think that technology helps your child's learning? If yes, how? If not, why not? Do you have a qualification in a science, tech0logy, engineering or mathematics (STEM) field? If yes, what is your qualification? ... Science Technology Maths Engineering Computing timestamp Source of dataset Camp Country Group
0 1 1 1 1 0 0 1 yes, it gives them information they're looking... 1 BSC in physics ... 4.0 4.0 4.0 4.0 4.0 08.08.2024 at 14:37:21 O 1.0 Ireland Parent
1 2 1 1 1 0 1 1 Yes, technology helps to create learning envir... 1 Masters in computer science ... 4.0 4.0 4.0 4.0 4.0 08.08.2024 at 14:40:00 O 1.0 Ireland Parent
2 3 1 1 1 0 0 1 Yes helps with research, awareness 0 NaN ... 4.0 4.0 4.0 4.0 4.0 08.08.2024 at 14:42:46 O 1.0 Ireland Parent
3 4 1 1 1 0 1 1 NaN 0 NaN ... 5.0 4.0 4.0 3.0 4.0 08.08.2024 at 14:44:58 O 1.0 Ireland Parent
4 5 1 1 1 1 1 1 It is a great way to explore information 1 Masters of engineering science ... 4.0 4.0 4.0 4.0 4.0 08.08.2024 at 14:47:31 O 1.0 Ireland Parent

5 rows × 29 columns

In [8]:
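## Column Matching Utility
# Helper code to find matching column names in datasets.
def find_column(df, search):
    """Return the first column whose name contains `search` (case-insensitive).

    Falls back to the closest fuzzy match; raises KeyError if nothing is close enough.
    """
    needle = search.strip().lower()
    # Stage 1: case-insensitive substring match, ignoring surrounding whitespace
    for col in df.columns:
        if needle in col.strip().lower():
            return col
    # Stage 2: fuzzy match, tolerating small wording differences between surveys
    matches = difflib.get_close_matches(search.strip(), df.columns, n=1, cutoff=0.6)
    if matches:
        return matches[0]
    raise KeyError(f"Column '{search}' not found in DataFrame. Available columns: {list(df.columns)}")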
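To see how the two matching stages behave, the sketch below re-declares the helper and runs it against a toy DataFrame. The `toy` frame and its column names are invented for illustration; only the logic of `find_column` comes from the notebook.

```python
import difflib
import pandas as pd

def find_column(df, search):
    # Same logic as the notebook helper: substring match first, fuzzy match second.
    for col in df.columns:
        if search.strip().lower() in col.strip().lower():
            return col
    matches = difflib.get_close_matches(search.strip(), df.columns, n=1, cutoff=0.6)
    if matches:
        return matches[0]
    raise KeyError(f"Column '{search}' not found in DataFrame.")

# Hypothetical survey headers, one with a trailing newline as in the exported forms.
toy = pd.DataFrame(columns=[
    "Are you a boy or a girl?\n",
    "How confident are you in your coding skills?",
])

# Stage 1: the search text is contained in the header once case and whitespace are ignored.
substring_hit = find_column(toy, "are you a boy or a girl")

# Stage 2: a typo ("codng") defeats the substring test, so the fuzzy fallback is used.
fuzzy_hit = find_column(toy, "How confident are you in your codng skills?")

# A search with no plausible match raises KeyError.
try:
    find_column(toy, "zzz")
    missing_raises = False
except KeyError:
    missing_raises = True
```

Because the substring stage returns the first hit, a short search string can match an unintended column; longer, more specific search text is the safer choice.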
In [9]:
## Calculate Student TAM Scores
# Calculate TAM scores for students in Ireland and India.

# Mapping for India student data
confidence_map = {
    'Not at all confident': 1, 
    'A little confident': 2, 
    'Quite a bit confident': 3, 
    'Really confident': 4
}
interest_map = {
    'Not at all interested': 1, 
    'A little interested': 2, 
    'Quite a bit interested': 3, 
    'Really interested': 4
}
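`Series.map` with a dict returns NaN for any value that is not a key, so a response with stray whitespace or unexpected wording silently drops out of the scores. The sketch below, using invented responses, shows one way to list such values before averaging.

```python
import pandas as pd

confidence_map = {
    'Not at all confident': 1,
    'A little confident': 2,
    'Quite a bit confident': 3,
    'Really confident': 4,
}

# Hypothetical responses: one has trailing whitespace, one is not a scale label.
responses = pd.Series([
    'A little confident',
    'Really confident ',
    'Not sure',
    None,
])

mapped = responses.map(confidence_map)

# Values that were answered but did not match any key in the mapping.
unmapped = responses[mapped.isna() & responses.notna()].unique().tolist()

# Stripping whitespace before mapping recovers the second response.
mapped_clean = responses.str.strip().map(confidence_map)
```

Inspecting `unmapped` distinguishes genuinely missing answers from answers lost to a labelling mismatch.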
In [10]:
# Find columns for India students
try:
    coding_conf_col = find_column(students_india, 'How confident are you in your coding skills')
    computer_conf_col = find_column(students_india, 'How confident are you in your computer skills')
    interest_stem_col = find_column(students_india, 'How interested are you in science and technology')
    interest_maths_col = find_column(students_india, 'How interested are you in maths')
    gender_col_india = find_column(students_india, 'Are you a boy or a girl')
except KeyError as e:
    print(f"Error in India students column mapping: {e}")
    raise
In [11]:
# Map India student responses
students_india['coding_confidence_num'] = students_india[coding_conf_col].map(confidence_map)
students_india['computer_confidence_num'] = students_india[computer_conf_col].map(confidence_map)
students_india['interest_stem_num'] = students_india[interest_stem_col].map(interest_map)
students_india['interest_maths_num'] = students_india[interest_maths_col].map(interest_map)
In [12]:
# Calculate TAM scores for India students
# PU is proxied by the two interest items, PEOU by the two confidence items.
students_india['PU_Score'] = students_india[['interest_stem_num', 'interest_maths_num']].mean(axis=1)
students_india['PEOU_Score'] = students_india[['coding_confidence_num', 'computer_confidence_num']].mean(axis=1)
# NOTE: Attitude_Score uses the same two items as PU_Score, so the two scores are identical.
students_india['Attitude_Score'] = students_india[['interest_stem_num', 'interest_maths_num']].mean(axis=1)

# Map gender
gender_map = {'Boy': 1, 'Girl': 2, 'Neither': 3}
students_india['Gender_Num'] = students_india[gender_col_india].map(gender_map)
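The row-wise `mean(axis=1)` used for the TAM scores skips NaN by default, so a student who answered only one of the two items still receives a score based on that single item. A minimal sketch with made-up values contrasts this with a stricter rule that requires both items:

```python
import numpy as np
import pandas as pd

# Hypothetical item scores; NaN marks an unanswered or unmapped item.
items = pd.DataFrame({
    'interest_stem_num':  [4, 2, np.nan, np.nan],
    'interest_maths_num': [2, np.nan, 3, np.nan],
})

# Default behaviour (skipna=True): average whatever items are present.
pu_default = items.mean(axis=1)

# Stricter alternative: leave the score missing unless both items are present.
pu_strict = items.mean(axis=1, skipna=False)
```

Which rule is appropriate depends on how much item-level missingness the survey has; the notebook uses the default.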
In [13]:
# Find columns for Ireland students
try:
    interest_stem_col_ie = find_column(students_ireland, 'How interested are you in science and technology')
    interest_maths_col_ie = find_column(students_ireland, 'How interested are you in maths')
    computer_conf_col_ie = find_column(students_ireland, 'How confident are you in your computer skills')
    coding_conf_col_ie = find_column(students_ireland, 'How confident are you in your coding skills')
    gender_col_ie = find_column(students_ireland, 'Are you a boy or a girl')
except KeyError as e:
    print(f"Error in Ireland students column mapping: {e}")
    raise
In [14]:
# Calculate TAM scores for Ireland students
# The Ireland sheet already stores these items as numeric codes, so no mapping is needed.
students_ireland['PU_Score'] = students_ireland[[interest_stem_col_ie, interest_maths_col_ie]].mean(axis=1)
students_ireland['PEOU_Score'] = students_ireland[[computer_conf_col_ie, coding_conf_col_ie]].mean(axis=1)
# NOTE: Attitude_Score uses the same two items as PU_Score, so the two scores are identical.
students_ireland['Attitude_Score'] = students_ireland[[interest_stem_col_ie, interest_maths_col_ie]].mean(axis=1)

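# Map gender for Ireland
# The Ireland sheet stores gender as numeric codes (see Out[5]), so the text-keyed
# gender_map would turn every value into NaN. Keep the codes as they are, assuming
# they follow the same coding as gender_map (1=Boy, 2=Girl).
students_ireland['Gender_Num'] = pd.to_numeric(students_ireland[gender_col_ie], errors='coerce')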
In [15]:
## Calculate Parent TAM Scores
# Calculate TAM scores for parents in Ireland and India.

# Ireland Parents TAM
# Missing answers are filled with 4, which pulls the group mean toward 4.
parents_ireland['PU_Score'] = parents_ireland['My child finds technology'].fillna(4)
parents_ireland['PEOU_Score'] = parents_ireland['My child finds coding'].fillna(4)

# India Parents TAM mapping
india_tech_map = {
    'Very Boring': 1, 'Somewhat Boring': 2, 'Neither boring Nor Interesting': 3,
    'Somewhat Interesting': 4, 'Interesting': 5, 'Very Interesting': 6
}
In [16]:
# Find columns for India parents
try:
    tech_col_india = find_column(parents_india, 'My child finds technology')
    coding_col_india = find_column(parents_india, 'My child finds coding')
except KeyError as e:
    print(f"Error in India parents column mapping: {e}")
    raise

# Map India parent responses
parents_india['tech_interest_num'] = parents_india[tech_col_india].map(india_tech_map)
parents_india['coding_interest_num'] = parents_india[coding_col_india].map(india_tech_map)

# Calculate TAM scores for India parents
# Missing or unmapped answers are filled with 4 ('Somewhat Interesting' on the 1-6 scale).
parents_india['PU_Score'] = parents_india['tech_interest_num'].fillna(4)
parents_india['PEOU_Score'] = parents_india['coding_interest_num'].fillna(4)
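Filling missing answers with a constant shifts the group mean toward that constant, and the size of the shift grows with the number of filled values. The sketch below uses invented responses on the 1-6 scale to compare the mean of observed answers with the mean after `fillna(4)`, and counts how many values were filled.

```python
import numpy as np
import pandas as pd

# Hypothetical mapped responses on the 1-6 scale; two parents skipped the item.
tech_interest = pd.Series([6, 5, np.nan, 6, np.nan])

mean_observed = tech_interest.mean()            # ignores the missing answers
mean_imputed = tech_interest.fillna(4).mean()   # treats them as 'Somewhat Interesting'
n_missing = int(tech_interest.isna().sum())
```

Reporting `n_missing` next to the imputed mean makes it easier to judge how much of a country difference could come from imputation alone.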
In [17]:
## Combine Datasets
# Combine student and parent datasets for analysis.
students_combined = pd.concat([
    students_ireland[['PU_Score', 'PEOU_Score', 'Attitude_Score', 'Country', 'Group', 'Gender_Num']],
    students_india[['PU_Score', 'PEOU_Score', 'Attitude_Score', 'Country', 'Group', 'Gender_Num']]
], ignore_index=True)

parents_combined = pd.concat([
    parents_ireland[['PU_Score', 'PEOU_Score', 'Country', 'Group']],
    parents_india[['PU_Score', 'PEOU_Score', 'Country', 'Group']]
], ignore_index=True)
In [18]:
## TAM Comparison Chart
# Visualize average TAM scores for students and parents by country.
# Students comparison
student_means = students_combined.groupby('Country')[['PU_Score', 'PEOU_Score', 'Attitude_Score']].mean().reset_index()
student_melted = pd.melt(student_means, id_vars=['Country'], value_vars=['PU_Score', 'PEOU_Score', 'Attitude_Score'], 
                         var_name='TAM_Dimension', value_name='Score')
fig = px.bar(student_melted, x='Country', y='Score', color='TAM_Dimension', barmode='group',
             title='Student TAM Scores by Country', labels={'Score': 'Average Score'})
fig.show()
# Parents comparison
parent_means = parents_combined.groupby('Country')[['PU_Score', 'PEOU_Score']].mean().reset_index()
parent_melted = pd.melt(parent_means, id_vars=['Country'], value_vars=['PU_Score', 'PEOU_Score'],
                        var_name='TAM_Dimension', value_name='Score')
fig = px.bar(parent_melted, x='Country', y='Score', color='TAM_Dimension', barmode='group',
             title='Parent TAM Scores by Country', labels={'Score': 'Average Score'})
fig.show()
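The grouped bar charts rely on reshaping per-country means from wide to long form. A small sketch on toy data (values invented) shows what the `groupby` and `melt` steps produce before the result is handed to `px.bar`:

```python
import pandas as pd

# Hypothetical scores for four students.
toy = pd.DataFrame({
    'Country': ['Ireland', 'Ireland', 'India', 'India'],
    'PU_Score': [2.0, 3.0, 3.0, 4.0],
    'PEOU_Score': [3.0, 3.0, 2.0, 3.0],
})

# Wide form: one row per country, one column per TAM dimension.
means = toy.groupby('Country')[['PU_Score', 'PEOU_Score']].mean().reset_index()

# Long form: one row per (country, dimension) pair, which is what a grouped bar chart expects.
long_form = means.melt(id_vars='Country', var_name='TAM_Dimension', value_name='Score')
```

Each row of `long_form` becomes one bar, with `TAM_Dimension` driving the colour.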
In [19]:
## Gender Stereotype Heatmap
# Visualize parent perceptions of gender abilities in technology.

# Define exact column names for gender stereotype analysis
ireland_girls_col = 'I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very much true'
ireland_boys_col = 'I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very much true'
india_girls_col = 'I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very'
india_boys_col = 'I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very '

Parent Perceptions of Gender and Technology

This section analyzes how parents in Ireland and India perceive the technological abilities of boys and girls. The scores represent average agreement (0-100 scale) with statements about gender and technology ability. These results are visualized in a heatmap to highlight differences and potential biases.

In [20]:
# Gender Stereotype Analysis: Use only relevant columns for each dataset

# Check Ireland columns
for col in [ireland_girls_col, ireland_boys_col]:
    if col not in parents_ireland.columns:
        print(f"Warning: Expected column '{col}' not found in Parents_Ireland dataset")

# Check India columns
for col in [india_girls_col, india_boys_col]:
    if col not in parents_india.columns:
        print(f"Warning: Expected column '{col}' not found in Parents_India dataset")

# Analysis: Calculate correlation matrices for gender stereotype columns in both datasets
# and compare the strength and direction of correlations

# Select only the relevant columns for correlation analysis
ireland_corr_data = parents_ireland[[ireland_girls_col, ireland_boys_col]]
india_corr_data = parents_india[[india_girls_col, india_boys_col]]

# Calculate correlation matrices
ireland_corr = ireland_corr_data.corr()
india_corr = india_corr_data.corr()

# Print correlation matrices
print("\nIreland Parents - Gender Stereotype Correlation Matrix:")
print(ireland_corr)
print("\nIndia Parents - Gender Stereotype Correlation Matrix:")
print(india_corr)

# Analysis: Visualize the distribution of gender stereotype scores in both datasets
import matplotlib.pyplot as plt
import seaborn as sns

# Set up the matplotlib figure
plt.figure(figsize=(12, 6))

# Ireland distribution
plt.subplot(1, 2, 1)
sns.histplot(parents_ireland[ireland_girls_col], kde=True, color='blue', label='Girls', bins=30)
sns.histplot(parents_ireland[ireland_boys_col], kde=True, color='red', label='Boys', bins=30)
plt.title('Ireland Parents - Gender Stereotype Scores Distribution')
plt.xlabel('Stereotype Score')
plt.ylabel('Frequency')
plt.legend()

# India distribution
plt.subplot(1, 2, 2)
sns.histplot(parents_india[india_girls_col], kde=True, color='blue', label='Girls', bins=30)
sns.histplot(parents_india[india_boys_col], kde=True, color='red', label='Boys', bins=30)
plt.title('India Parents - Gender Stereotype Scores Distribution')
plt.xlabel('Stereotype Score')
plt.ylabel('Frequency')
plt.legend()

# Show the plots
plt.tight_layout()
plt.show()
Ireland Parents - Gender Stereotype Correlation Matrix:
                                                    I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very much true  \
I think that girls usually do well in technolog...                                           1.000000                                                                                                                                                                    
I think that boys usually do well in technology...                                           0.881595                                                                                                                                                                    

                                                    I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very much true  
I think that girls usually do well in technolog...                                           0.881595                                                                                                                                                                  
I think that boys usually do well in technology...                                           1.000000                                                                                                                                                                  

India Parents - Gender Stereotype Correlation Matrix:
                                                    I think that girls usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very  \
I think that girls usually do well in technolog...                                           1.000000                                                                                                                                                          
I think that boys usually do well in technology...                                           0.878423                                                                                                                                                          

                                                    I think that boys usually do well in technology. Please rate how much you agree with this statement on a scale from 0 to 100 by writing a number in the box, where 0 is not true at all and 100 is very   
I think that girls usually do well in technolog...                                           0.878423                                                                                                                                                         
I think that boys usually do well in technology...                                           1.000000                                                                                                                                                         
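[Figure: two histograms with KDE overlays, "Ireland Parents - Gender Stereotype Scores Distribution" and "India Parents - Gender Stereotype Scores Distribution"; x-axis: Stereotype Score, y-axis: Frequency; Girls in blue, Boys in red]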
In [21]:
# Calculate means with NaN handling
ireland_girls = pd.to_numeric(parents_ireland[ireland_girls_col], errors='coerce').mean()
ireland_boys = pd.to_numeric(parents_ireland[ireland_boys_col], errors='coerce').mean()
india_girls = pd.to_numeric(parents_india[india_girls_col], errors='coerce').mean()
india_boys = pd.to_numeric(parents_india[india_boys_col], errors='coerce').mean()

# Create gender_data DataFrame
gender_data = pd.DataFrame({
    'Country': ['Ireland', 'India'],
    'Girls_Technology_Ability': [ireland_girls, india_girls],
    'Boys_Technology_Ability': [ireland_boys, india_boys]
})
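Each parent rates both statements, so the boys-minus-girls gap can also be examined as a paired difference rather than as two separate means. The sketch below is one possible check, not part of the original analysis: it uses invented ratings and `scipy.stats.ttest_rel`, and the column names `girls` and `boys` are placeholders for the long survey headers.

```python
import pandas as pd
from scipy import stats

# Hypothetical paired ratings: each row is one parent rating both statements.
ratings = pd.DataFrame({
    'girls': [100, 50, 50, 100, 70, None],
    'boys':  [100, 50, 70, 100, 95, 80],
})

# Coerce to numeric and keep only parents who answered both items,
# so every boys rating stays paired with the same parent's girls rating.
paired = ratings.apply(pd.to_numeric, errors='coerce').dropna()

mean_gap = (paired['boys'] - paired['girls']).mean()
t_stat, p_value = stats.ttest_rel(paired['boys'], paired['girls'])
```

With small samples and ratings bounded at 0 and 100, a rank-based alternative such as `scipy.stats.wilcoxon` may be a better fit than the paired t-test.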
In [22]:
# Melt for Plotly heatmap
gender_melted = pd.melt(gender_data, id_vars=['Country'], value_vars=['Girls_Technology_Ability', 'Boys_Technology_Ability'],
                        var_name='Perception', value_name='Score')
In [23]:
# Create heatmap
fig = px.density_heatmap(gender_melted, x='Perception', y='Country', z='Score', text_auto='.1f',
                         title='Gender Stereotypes in Technology by Country (Parent Perceptions)',
                         labels={'Score': 'Agreement Level (0-100)'}, color_continuous_scale='RdYlBu_r')
fig.update_layout(width=800, height=600)
fig.show()
In [24]:
## Confidence vs Career Interest Scatter
# Visualize relationship between coding confidence and STEM interest.
fig = px.scatter(students_combined, x='PEOU_Score', y='PU_Score', color='Country', size_max=10,
                 title='Technology Confidence vs STEM Interest by Country',
                 labels={'PEOU_Score': 'Perceived Ease of Use (Coding Confidence)', 'PU_Score': 'Perceived Usefulness (STEM Interest)'})
In [25]:
# Add trend line
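# Drop NaN jointly so x and y stay paired row by row (separate dropna calls can misalign them)
xy = students_combined[['PEOU_Score', 'PU_Score']].dropna()
x = xy['PEOU_Score']
y = xy['PU_Score']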
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
trend_data = pd.DataFrame({'x': x, 'y': p(x)})
fig.add_scatter(x=trend_data['x'], y=trend_data['y'], mode='lines', name='Trend Line', line=dict(color='red', dash='dash'))
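A trend line or correlation only makes sense when each x value is paired with the y value from the same respondent. The sketch below, on invented scores, shows how calling `dropna()` on each column separately can keep different rows, and how dropping NaN jointly avoids that.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with NaN in different rows of each column.
scores = pd.DataFrame({
    'PEOU_Score': [1.0, 2.0, np.nan, 4.0, 3.0],
    'PU_Score':   [1.5, np.nan, 3.0, 4.5, 3.5],
})

# Dropping NaN per column keeps a different set of rows in each Series.
x_separate = scores['PEOU_Score'].dropna()
y_separate = scores['PU_Score'].dropna()
same_rows = x_separate.index.equals(y_separate.index)

# Dropping NaN jointly keeps only rows where both scores exist.
pairs = scores[['PEOU_Score', 'PU_Score']].dropna()
slope, intercept = np.polyfit(pairs['PEOU_Score'], pairs['PU_Score'], 1)
```

In this toy case the two separately filtered Series have equal length, so a fit on them would run without error while pairing values from different respondents.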
In [26]:
# Add correlation annotation
correlation = stats.pearsonr(x, y)[0]
fig.add_annotation(x=0.05, y=0.95, xref='paper', yref='paper', showarrow=False,
                   text=f'Overall Correlation: {correlation:.3f}', bgcolor='wheat')
fig.show()

## Correlation Matrix
# Visualize correlations between TAM dimensions.
corr_data = students_combined[['PU_Score', 'PEOU_Score', 'Attitude_Score']].corr().reset_index().melt(id_vars='index', var_name='Variable2', value_name='Correlation')

# Create heatmap
fig = px.density_heatmap(corr_data, x='index', y='Variable2', z='Correlation', text_auto='.3f',
                         title='TAM Dimensions Correlation Matrix (All Students)', color_continuous_scale='RdBu',
                         labels={'Correlation': 'Correlation Coefficient', 'index': 'Variable1'})
fig.show()
In [27]:
## Gender Differences Analysis
# Visualize TAM scores by gender and country.
student_gender_data = students_combined.dropna(subset=['Gender_Num'])

# Create box plots
for dim, title in zip(['PU_Score', 'PEOU_Score', 'Attitude_Score'], ['Perceived Usefulness', 'Perceived Ease of Use', 'Attitude']):
    fig = px.box(student_gender_data, x='Gender_Num', y=dim, color='Country',
                 title=f'{title} by Gender and Country',
                 labels={'Gender_Num': 'Gender (1=Boy, 2=Girl, 3=Neither)', dim: 'TAM Score'})
    fig.show()
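Numeric gender codes on the x-axis are harder to read than category names. One option, sketched here on toy data, is to invert `gender_map` and add a text column to use as the x-axis in `px.box`; the `Gender` column name is an assumption introduced for this example.

```python
import pandas as pd

gender_map = {'Boy': 1, 'Girl': 2, 'Neither': 3}

# Invert the mapping so numeric codes can be turned back into labels.
gender_labels = {code: label for label, code in gender_map.items()}

# Hypothetical combined data; the last row has no recorded gender.
toy = pd.DataFrame({
    'Gender_Num': [1, 2, 2, 3, None],
    'PU_Score': [3.0, 2.5, 4.0, 3.5, 2.0],
})

plot_data = toy.dropna(subset=['Gender_Num']).copy()
# Codes are floats because of the NaN; cast to int before looking up labels.
plot_data['Gender'] = plot_data['Gender_Num'].astype(int).map(gender_labels)
```

Passing `x='Gender'` instead of `x='Gender_Num'` would then label the boxes directly.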
In [28]:
## Clustering Analysis
# Perform K-means clustering on TAM scores and visualize in 3D.
cluster_data = students_combined[['PU_Score', 'PEOU_Score', 'Attitude_Score']].dropna()

# Standardize data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(cluster_data)

# Perform K-means clustering
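# n_init is set explicitly so results do not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)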
clusters = kmeans.fit_predict(scaled_data)

# Add cluster labels
cluster_data_plot = cluster_data.copy()
cluster_data_plot['Cluster'] = clusters

# Create 3D scatter plot
fig = px.scatter_3d(cluster_data_plot, x='PU_Score', y='PEOU_Score', z='Attitude_Score', color='Cluster',
                    title='Student Clusters Based on TAM Scores',
                    labels={'PU_Score': 'Perceived Usefulness', 'PEOU_Score': 'Perceived Ease of Use', 'Attitude_Score': 'Attitude Score'})
fig.show()
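In this notebook `Attitude_Score` is computed from the same two items as `PU_Score`, so the clustering input contains that dimension twice. K-means relies on Euclidean distance, and a duplicated feature counts double in it. The sketch below uses three invented, already standardized points to show that the duplication can change which neighbour is closest.

```python
import numpy as np

def with_duplicate_pu(point):
    # Append a copy of the first coordinate (PU), mimicking Attitude_Score == PU_Score.
    return np.array([point[0], point[1], point[0]])

# Hypothetical standardized (PU, PEOU) coordinates.
p = np.array([0.0, 0.0])
q = np.array([1.0, 0.0])   # differs from p only in PU
r = np.array([0.0, 1.2])   # differs from p only in PEOU

# Distances using PU and PEOU once each.
d_pq_2d = np.linalg.norm(p - q)
d_pr_2d = np.linalg.norm(p - r)

# Distances when PU appears twice, as in the three-column clustering input.
d_pq_3d = np.linalg.norm(with_duplicate_pu(p) - with_duplicate_pu(q))
d_pr_3d = np.linalg.norm(with_duplicate_pu(p) - with_duplicate_pu(r))
```

With two features `q` is closer to `p`; with PU duplicated `r` becomes closer. Clustering on `PU_Score` and `PEOU_Score` alone would weight the two constructs equally.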
In [29]:
## Summary Statistics
# Generate comprehensive summary statistics.

print('=== TAM ANALYSIS SUMMARY REPORT ===\n')

# Sample sizes
print('1. SAMPLE SIZES:')
print(f'   Students Ireland: {len(students_combined[students_combined["Country"]=="Ireland"])}')
print(f'   Students India: {len(students_combined[students_combined["Country"]=="India"])}')
print(f'   Parents Ireland: {len(parents_combined[parents_combined["Country"]=="Ireland"])}')
print(f'   Parents India: {len(parents_combined[parents_combined["Country"]=="India"])}\n')

# TAM scores by country
print('2. AVERAGE TAM SCORES BY COUNTRY (Students):')
student_summary = students_combined.groupby('Country')[['PU_Score', 'PEOU_Score', 'Attitude_Score']].agg(['mean', 'std'])
print(student_summary.round(3))
print()

# Gender stereotype analysis
print('3. GENDER STEREOTYPES IN TECHNOLOGY (Parents):')
print('   Average agreement scores (0-100 scale):')

valid_gender_data = True
for country in ['Ireland', 'India']:
    row = gender_data[gender_data['Country'] == country]
    if row.empty or 'Girls_Technology_Ability' not in row.columns or 'Boys_Technology_Ability' not in row.columns:
        print(f'   {country} - Girls capable: N/A')
        print(f'   {country} - Boys capable: N/A')
        valid_gender_data = False
    else:
        girls_score = row['Girls_Technology_Ability'].iloc[0]
        boys_score = row['Boys_Technology_Ability'].iloc[0]
        print(f'   {country} - Girls capable: {girls_score:.1f}' if not pd.isna(girls_score) else f'   {country} - Girls capable: N/A')
        print(f'   {country} - Boys capable: {boys_score:.1f}' if not pd.isna(boys_score) else f'   {country} - Boys capable: N/A')

# Calculate bias (Boys - Girls); the result is NaN when either mean is missing
gender_by_country = gender_data.set_index('Country')
bias = gender_by_country['Boys_Technology_Ability'] - gender_by_country['Girls_Technology_Ability']
ireland_bias = bias.get('Ireland', np.nan) if valid_gender_data else np.nan
india_bias = bias.get('India', np.nan) if valid_gender_data else np.nan

print('\n   Gender bias (Boys - Girls scores):')
print(f'   Ireland: {ireland_bias:.1f} points' if not pd.isna(ireland_bias) else '   Ireland: N/A')
print(f'   India: {india_bias:.1f} points\n' if not pd.isna(india_bias) else '   India: N/A\n')

# Key insights
print('4. KEY INSIGHTS:')
ireland_students = students_combined[students_combined['Country']=='Ireland']
india_students = students_combined[students_combined['Country']=='India']

ireland_pu = ireland_students['PU_Score'].mean()
india_pu = india_students['PU_Score'].mean()
ireland_peou = ireland_students['PEOU_Score'].mean()
india_peou = india_students['PEOU_Score'].mean()

print(f'   • {"India" if india_pu > ireland_pu else "Ireland"} students show higher STEM interest')
print(f'   • {"India" if india_peou > ireland_peou else "Ireland"} students show higher coding confidence')
if not pd.isna(ireland_bias) and not pd.isna(india_bias):
    print(f'   • {"Ireland" if abs(ireland_bias) < abs(india_bias) else "India"} shows less gender bias in technology')
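    if abs(ireland_bias) > 10 or abs(india_bias) > 10:
        # "Large" rather than "significant": no statistical test is run here
        print('   • Large gender gap detected (>10 points difference)')

# Drop NaN jointly so confidence and interest values stay paired by respondent
paired_scores = students_combined[['PEOU_Score', 'PU_Score']].dropna()
correlation = stats.pearsonr(paired_scores['PEOU_Score'], paired_scores['PU_Score'])[0]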
print(f'   • Overall correlation between confidence and interest: {correlation:.3f}')
=== TAM ANALYSIS SUMMARY REPORT ===

1. SAMPLE SIZES:
   Students Ireland: 34
   Students India: 45
   Parents Ireland: 33
   Parents India: 44

2. AVERAGE TAM SCORES BY COUNTRY (Students):
        PU_Score        PEOU_Score        Attitude_Score       
            mean    std       mean    std           mean    std
Country                                                        
India      2.933  0.837      2.278  0.743          2.933  0.837
Ireland    2.103  0.860      2.971  0.674          2.103  0.860

3. GENDER STEREOTYPES IN TECHNOLOGY (Parents):
   Average agreement scores (0-100 scale):
   Ireland - Girls capable: 77.4
   Ireland - Boys capable: 78.8
   India - Girls capable: 54.8
   India - Boys capable: 62.0

   Gender bias (Boys - Girls scores):
   Ireland: 1.4 points
   India: 7.2 points

4. KEY INSIGHTS:
   • India students show higher STEM interest
   • Ireland students show higher coding confidence
   • Ireland shows less gender bias in technology
   • Overall correlation between confidence and interest: 0.044